-
Notifications
You must be signed in to change notification settings - Fork 30
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add pass to assign tiles to logical objectFifos #915
Conversation
@@ -605,6 +605,10 @@ void addAMDAIEObjectFifoLoweringPasses( | |||
passManager.addPass(createCSEPass()); | |||
passManager.addPass(createCanonicalizerPass()); | |||
|
|||
passManager.addPass(createAMDAIEAssignTilesPass()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this pass supposed to be added to the pipeline here? Thought you wanted to still do everything in SplitLogicalObjFifosForConnectionReusePass
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This pass is now run after SplitLogicalObjFifosForConnectionReusePass
, because everything done in that pass (and the one from @yzhang93 we will enable soon) might cause us to want to assign new tile locations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense
If you want to semi-ensure that 2 passes are always run back-to-back, you can use the aievec approach of adding to the pipeline with a helper function:
(minus) doesn't technically ensure a user doesn't misuse the pass |
Yeah, but the disadvantage is a real issue for me. A pass not being stable on itself is just an invite for bugs and misuse. The workaround hides the real underlying problem that needs to be tackled, which is ensuring that the output without tile assignment is stable on itself. I have some ideas for this, but I don't think it's a priority right now because the pass has held up pretty well as it stands and if needed, you can easily debug it with |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally LGTM. Some minor comments.
@@ -20,7 +20,7 @@ | |||
// CHECK: aie.use_lock | |||
// Check a bit of the aiex.runtime_sequence: | |||
// CHECK: aiex.runtime_sequence @matmul_i32() | |||
// CHECK: } {npu_instructions = dense_resource<npu_instructions> : tensor<174xui32>, runtime_sequence_name = "matmul_i32"} | |||
// CHECK: } {npu_instructions = dense_resource<npu_instructions> : tensor<208xui32>, runtime_sequence_name = "matmul_i32"} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the meaning of this tensor? Why does the value change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the control code instructions tensor. It was a bit weird that this changed, but I had a look at it and it seems like earlier channels were shared when they shouldn't have been and now they are not anymore, leading to more shim channels being used and therefore, more control code.
"device-specific information is required to determine when loops " | ||
"can be subsumed into DMA operations, and must be attached to a " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: the error message should be changed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the catch, updated it,
@@ -67,6 +67,11 @@ def AMDAIEAssignPacketIds : | |||
let constructor = "mlir::iree_compiler::AMDAIE::createAMDAIEAssignPacketIdsPass()"; | |||
} | |||
|
|||
def AMDAIEAssignTiles : Pass<"iree-amdaie-assign-tiles", ""> { | |||
let summary = "Assign physical tile locations to logical objFifos."; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: objFifos -> objectFifos.
|
||
// ----- | ||
|
||
// Test duplicate global logical objFifos. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: maybe add note to indicate global logical objFifos are in L3?
return success(); | ||
} | ||
|
||
/// Utility to duplicate global objFifos for each strided copy-like operation |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: global objFifos -> global logical objectFifos in L3
rewriter.setInsertionPoint(copyOp); | ||
auto newTarget = rewriter.create<AMDAIE::LogicalObjectFifoFromMemrefOp>( | ||
rewriter.getUnknownLoc(), | ||
cast<LogicalObjectFifoType>(target.getOutput().getType()), | ||
target.getMemref()); | ||
rewriter.replaceUsesWithIf( | ||
target.getOutput(), newTarget.getOutput(), [&](OpOperand &use) { | ||
return use.getOwner() == copyOp.getOperation(); | ||
}); | ||
} | ||
}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are almost duplicate codes as the above. Maybe extract these as a separate function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
AMDAIE::TileOp assignedTileOp = | ||
*std::min_element(tiles.begin(), tiles.end(), tileLocAndUsageCmp); | ||
|
||
// Increase usage of the chosen tile. | ||
int64_t col = getConstantIndexOrAssert(assignedTileOp.getCol()); | ||
int64_t row = getConstantIndexOrAssert(assignedTileOp.getRow()); | ||
tileLocToUsage[TileLoc(col, row)] += 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add more comments to explain these?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, no blocking comments from me
tiles.erase(std::unique(tiles.begin(), tiles.end()), tiles.end()); | ||
for (AMDAIE::TileOp tile : tiles) { | ||
std::optional<int64_t> column = getConstantIntValue(tile.getCol()); | ||
if (!column) return tile.emitOpError() << "found non-constant column"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An error without resulting signal of pass failure. But this isn't new in this PR, so not requesting a change on this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I changed this to notifyMatchFailure
.
parentOp->emitOpError() << "failed duplicating global object fifos"; | ||
return signalPassFailure(); | ||
} | ||
if (failed(verify(parentOp, true))) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question 1: The additional prints and verifications: I know they're copied from another pass, but do you think maybe they're not needed here? Because they're not in other passes. They suggest to me that this pass is still not to be considered stable, is that accurate?
Question 2: I assume duplicateGlobalObjFifos
is something you explicitly do not want to use in DistributeCoresAndObjectFifos` ? If that's not the case, you could have moved this copied code into another util function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question 1: The additional prints and verifications: I know they're copied from another pass, but do you think maybe they're not needed here? Because they're not in other passes. They suggest to me that this pass is still not to be considered stable, is that accurate?
Yeah, good point, I think this part is actually stable and they can be removed.
Question 2: I assume duplicateGlobalObjFifos is something you explicitly do not want to use in DistributeCoresAndObjectFifos` ? If that's not the case, you could have moved this copied code into another util function.
The reason for this was mainly to keep the behaviour of DistributeCoresAndObjectFifos
as close as possible to what it was to avoid more test changes. Maybe this could be tried and changed in a follow-up.
@@ -67,6 +67,11 @@ def AMDAIEAssignPacketIds : | |||
let constructor = "mlir::iree_compiler::AMDAIE::createAMDAIEAssignPacketIdsPass()"; | |||
} | |||
|
|||
def AMDAIEAssignTiles : Pass<"iree-amdaie-assign-tiles", ""> { | |||
let summary = "Assign physical tile locations to logical objFifos."; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let summary = "Assign physical tile locations to logical objFifos."; | |
let summary = "Assign physical tile locations to logical objFifos. Existing assignments will be ignored/replaced."; |
maybe?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, updated it.
Adds a pass that assigns tiles to logical objectFifos based on tile usage by trying to to distribute logical objectFifos as evenly as possible across available tiles. The new pass is based on the tile assignment logic that was originally part of
DistributeCoresAndObjectFifos
and refactors the latter pass to use the extracted functions to keep the logic the same for now.Note that I am not completely removing the tile assignment functions from
DistributeCoresAndObjectFifos
for two reasons:DistributeCoresAndObjectFifos
before tile assignment is not stable, i.e. calling cse/canonicalize in betweenDistributeCoresAndObjectFifos
andAssignTiles
leads to different functional (incorrect) results.Until the output of
DistributeCoresAndObjectFifos
before tile assignment is stable, the tile assignment utility functions should be called within the pass itself and not be separated out.